Growth of microbial genomes by short segmental duplications
Authors
Abstract
A DNA sequence can be analyzed as a text in a four-letter alphabet by counting how many times each k-letter word occurs in the text. If the text is random and long enough, the frequencies of word occurrence are expected to obey a Poisson distribution. Examination of complete microbial genomes shows that for k less than 9 the distribution is many times wider than a Poisson distribution: 42, 24, 9 and 3.2 times wider for k = 2, 4, 6 and 8, respectively. The cause of this phenomenon is not known. Here we propose a simple, biologically plausible model for the growth of genomes to explain it: the genome first grows randomly to a length much shorter than its final length and thereafter grows mainly by random segmental duplication. We show that, using an initial length of 1000 bases (1 kb) and duplicated segments with lengths averaging 25 b, one can generate a model sequence of the size of a microbial genome, of the order of 1 Mb, that exhibits the statistical characteristics of genomes.
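The two ideas in the abstract, measuring the width of the word-count distribution against the Poisson expectation and growing a sequence by segmental duplication, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of overlapping windows, the uniform distribution of segment lengths (chosen only so that the mean equals the stated average), insertion of each copy at a uniformly random position, and the absence of any mutation step.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"

def random_sequence(length, rng):
    """Uniformly random DNA text of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def kmer_counts(seq, k):
    """Occurrence count of each of the 4**k words, using overlapping windows."""
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [seen["".join(word)] for word in product(BASES, repeat=k)]

def width_ratio(seq, k):
    """Standard deviation of the word counts divided by sqrt(mean),
    which is the standard deviation of a Poisson distribution with that mean."""
    counts = kmer_counts(seq, k)
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return sqrt(variance) / sqrt(mean)

def grow_by_duplication(initial_len, final_len, mean_seg, rng):
    """Start from a random sequence, then copy random segments to random
    positions until final_len is reached. Assumes initial_len >= 2*mean_seg - 1."""
    seq = list(random_sequence(initial_len, rng))
    while len(seq) < final_len:
        # Assumed length distribution: uniform on [1, 2*mean_seg - 1], mean = mean_seg.
        seg_len = rng.randint(1, 2 * mean_seg - 1)
        start = rng.randrange(0, len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]
        pos = rng.randrange(0, len(seq) + 1)
        seq[pos:pos] = segment  # insert the copy without overwriting anything
    return "".join(seq[:final_len])

if __name__ == "__main__":
    rng = random.Random(0)
    # A reduced target length keeps the run short; the abstract uses about 1 Mb.
    model = grow_by_duplication(1000, 50_000, 25, rng)
    control = random_sequence(len(model), rng)
    for k in (2, 4, 6):
        print(k, round(width_ratio(control, k), 2), round(width_ratio(model, k), 2))
```

For the random control the ratio should stay close to 1, while for the duplication-grown sequence it should be noticeably larger, since the count fluctuations of the short initial sequence are carried along as the sequence grows. The sketch is not expected to reproduce the specific factors quoted in the abstract.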
Similar resources
Short Segmental Duplication: Parsimony in the Growth of Microbial Genomes
We show that textual analysis of microbial complete genomes reveals telling footprints of their early evolution. If a DNA sequence considered as a text in its four bases is sufficiently random, the distribution of frequencies of words of a fixed length from the text should be Poissonian. We point out that in reality, for words shorter than nine letters, complete microbial genomes universally have...
Short Segmental Duplication: Model for Growth of Microbial Genomes
We show that textual analysis of microbial complete genomes reveals telling footprints of their early evolution. If a DNA sequence considered as a text in its four bases is sufficiently random, the distribution of frequencies of words of a fixed length from the text should be Poissonian. We point out that in reality, for words shorter than nine letters, complete microbial genomes universally have d...
Universal Lengths in Microbial Genomes and Implication for Early Genome Growth
We report the discovery of a set of universal lengths that characterize all microbial complete genomes. The Shannon information [Shannon 1948] of 108 complete microbial genomes relative to that of their respective randomized counterparts is computed, and the results are summarized in a two-parameter exponential relation: Lr(k) = (42 ± 21) × 2.64^k, 2 ≤ k ≤ 10, where Lr is a "root-sequence length" ...
Evidence for Growth of Microbial Genomes by Short Segmental Duplications
We show that textual analysis of microbial genomes reveals telling footprints of the early evolution of the genomes. The frequencies of word occurrence of random DNA sequences considered as texts in their four nucleotides are expected to obey Poisson distributions. It is noticed that for words shorter than nine letters the average width of the distributions for complete microbial genomes is many ti...
Minimal model for genome evolution and growth.
Textual analysis of typical microbial genomes reveals that they have the statistical characteristics of a DNA sequence of a much shorter length. This peculiar property supports an evolutionary model in which a genome evolves by random mutation but primarily grows by random segmental duplication. That genomes grew mostly by duplication is consistent with the observation that repeat sequences in ...